Implement fixes for rstar #52
Maybe we could use even just
What I don't like about these is that if the user forgets
Pull Request Test Coverage Report for Build 3686560314
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
💛 - Coveralls
Codecov Report
Base: 94.50% // Head: 94.91% // Increases project coverage by +0.41%.
@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
+ Coverage   94.50%   94.91%   +0.41%
==========================================
  Files           9       10       +1
  Lines         619      669      +50
==========================================
+ Hits          585      635      +50
  Misses         34       34
If we enforce the type of the keyword argument (e.g. split::Int), passing a Bool is rejected:
julia> f(; split::Int=2) = split
f (generic function with 1 method)
julia> f()
2
julia> f(; split=4)
4
julia> f(; split=true)
ERROR: TypeError: in keyword argument split, expected Int64, got a value of type Bool
Stacktrace:
 [1] top-level scope
   @ REPL[4]:1
Ah, of course! Then in that case, I prefer
src/utils.jl
if haskey(d, xi)
    push!(d[xi], i)
else
    d[xi] = [i]
end
This can be made more efficient by not looking up the key twice. One could, e.g., replace the haskey branch with:
d_xi = get!(d, xi) do
    return Int[]
end
push!(d_xi, i)
Apart from that, it seems like a function that could exist, e.g., in StatsBase (similar to proportionmap etc.). Did you check that?
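For reference, a self-contained version of such a grouping helper with the single-lookup get! pattern applied might look like the sketch below. The name group_indices_by_value is hypothetical and not necessarily what src/utils.jl uses.

```julia
# Hypothetical helper: map each distinct value of `x` to the indices where it occurs.
function group_indices_by_value(x::AbstractVector)
    d = Dict{eltype(x),Vector{Int}}()
    for (i, xi) in pairs(x)
        # get! looks up `xi` once; if it is absent, it stores and returns a new Int[]
        d_xi = get!(d, xi) do
            return Int[]
        end
        push!(d_xi, i)
    end
    return d
end

d = group_indices_by_value([2, 1, 2, 3, 1])
```

Here d[1] == [2, 5], d[2] == [1, 3] and d[3] == [4]. Since Dict iteration order is unspecified, callers that need the groups in a fixed order have to sort the keys themselves.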
I agree this would fit in StatsBase, but there currently is no such method (indexmap is closest). MLUtils has group_indices, which is equivalent, but the dependency is too heavy.
I found a few threads of people looking for this, e.g. https://discourse.julialang.org/t/is-there-a-function-similar-to-numpy-unique-with-inverse/80949 but with no clear answer.
An alternative would be to stick closer to NumPy's very useful return_inverse=True approach and return 2 vectors, basically the sorted keys and corresponding values. Either way, this could later be upstreamed to StatsBase.
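A minimal sketch of the two-vector alternative, returning the sorted unique values and, for each one, the vector of indices where it occurs. The name unique_with_indices is hypothetical. Note that NumPy's return_inverse=True itself returns, for each element, its position in the unique array; in this sketch that mapping is the intermediate pos dictionary.

```julia
# Hypothetical helper: sorted unique values of `x` and, for each, the indices where it occurs.
function unique_with_indices(x::AbstractVector)
    vals = sort!(unique(x))
    # position of each unique value in `vals` (the analogue of NumPy's inverse mapping)
    pos = Dict(v => j for (j, v) in enumerate(vals))
    inds = [Int[] for _ in vals]
    for (i, xi) in pairs(x)
        push!(inds[pos[xi]], i)
    end
    return vals, inds
end

vals, inds = unique_with_indices([3, 1, 3, 2, 1])
```

This gives vals == [1, 2, 3] and inds == [[2, 5], [4], [1, 3]], so the groups come back in a deterministic, sorted order.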
Sounds good, I just want to make sure we use existing functionality. If it doesn't exist yet that's unfortunate but, of course, then we should use our own implementation.
Actually, it seems indicatormat returns the information we are interested in: https://juliastats.org/StatsBase.jl/stable/misc/#StatsBase.indicatormat But maybe it's not the desired output format for our purposes.
It's similar, yes, but a little clunky. For example, here's how we could get the vector of indices:
using SparseArrays, StatsBase
map(first ∘ findnz ∘ sparse, eachslice(indicatormat(x; sparse=true); dims=1))
But I still think it makes more sense to try to upstream the functionality we want, since something like what we want will often be more convenient for the user.
Yes, I agree. That seems a bit inconvenient.
end

"""
    split_chain_indices(
Isn't there some existing splitting functionality for ess? Is the plan to merge these eventually?
Not quite merge, because there are two different types of splitting we can consider. This approach supports ragged chains and is as a result more complex and doesn't discard any draws (instead dividing the remainder across the earlier splits).
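The ragged splitting described here can be sketched as follows, assuming each draw carries an integer chain label. split_evenly and split_chain_labels are hypothetical names; this is an illustration, not the PR's split_chain_indices. Each chain's draws are cut into contiguous pieces whose lengths differ by at most one, with the leftover draws going to the earlier pieces, so no draw is discarded.

```julia
# Hypothetical helper: cut `inds` into `nsplit` contiguous pieces whose lengths
# differ by at most one; the first `rem(length(inds), nsplit)` pieces get one extra.
function split_evenly(inds::AbstractVector{Int}, nsplit::Integer)
    nsplit >= 1 || throw(ArgumentError("nsplit must be at least 1"))
    q, r = divrem(length(inds), nsplit)
    pieces = Vector{Vector{Int}}(undef, nsplit)
    start = firstindex(inds)
    for k in 1:nsplit
        len = q + (k <= r ? 1 : 0)
        pieces[k] = inds[start:(start + len - 1)]
        start += len
    end
    return pieces
end

# Hypothetical helper: give every piece of every chain its own label,
# so each original chain becomes `nsplit` new chains (ragged chains allowed).
function split_chain_labels(chains::AbstractVector{Int}, nsplit::Integer)
    labels = similar(chains)
    next_label = 1
    for c in sort!(unique(chains))
        for piece in split_evenly(findall(==(c), chains), nsplit)
            labels[piece] .= next_label
            next_label += 1
        end
    end
    return labels
end

labels = split_chain_labels([1, 1, 1, 1, 1, 2, 2, 2], 2)
```

With 5 draws in chain 1 and 3 in chain 2, this yields labels == [1, 1, 1, 2, 2, 3, 3, 4]: the odd draw of each chain goes to its first split.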
For ess/rhat, we don't support ragged chains, so we would discard draws if necessary to keep the chains the same length after splitting. This implementation is much simpler and can be done in a non-allocating way with just reshape and view on a 3d array. This will be part of #22.
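For the equal-length case, the reshape-and-view approach might look like the sketch below. It assumes a (draws, chains, parameters) layout, and the name split_chains_view is hypothetical. Trailing draws that don't divide evenly are dropped, and no data is copied.

```julia
# Hypothetical sketch, assuming `x` has size (draws, chains, parameters).
# Returns a lazy (non-copying) array of size
# (draws ÷ split, chains * split, parameters).
function split_chains_view(x::AbstractArray{<:Any,3}, split::Integer=2)
    ndraws, nchains, nparams = size(x)
    ndraws_split = ndraws ÷ split
    # drop the trailing remainder so that all new chains have equal length
    xtrim = view(x, 1:(ndraws_split * split), :, :)
    # column-major order: consecutive blocks of draws from one chain
    # become adjacent new chains
    return reshape(xtrim, ndraws_split, nchains * split, nparams)
end

x = reshape(collect(1:14), 7, 2, 1)  # 7 draws, 2 chains, 1 parameter
y = split_chains_view(x, 2)
```

Here y has size (3, 4, 1); draw 7 of each chain is discarded, and y[:, 3, 1] == [8, 9, 10] is the first half of the original second chain.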
The existing splitting functionality, copy_split!, will go away.
Co-authored-by: David Widmann <[email protected]>
@devmotion I have implemented all of your suggestions. The failed integration test with MCMCChains is expected. That particular failure checks whether two chains, each of constant value, can be perfectly discriminated by the classifier. With the new default of
Looks good to me, thank you!
No problem, thanks for the detailed reviews!
Ah, it seems I do not have permissions to merge, since the integration test fails. Can you merge?
Pull Request Test Coverage Report for Build 3689690844
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
💛 - Coveralls
Fixes #51:
- split_chains=2 (originally named nsplit=2; as suggested in Add rank-normalized ESS and other variants #22 (comment)) controls how many chains each individual chain is split into, to check for within-chain convergence. split_chains=1 (originally nsplit=1) is the old behavior.
- A fraction frac of the draws from each chain is in the training data.
I'm not thrilled with the name nsplit, as it's not terribly descriptive, but I haven't thought of a better one that wasn't quite verbose.
As suggested in #51 (comment), we consider these changes non-breaking because they make the defaults consistent with the recommendations in the paper.
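Putting frac of each chain's draws into the training data amounts to stratifying the train/test split by chain. A minimal sketch, assuming one integer chain label per draw and uniform random sampling within each chain; stratified_train_indices is a hypothetical name, not rstar's API, and the rounding rule is an assumption.

```julia
using Random

# Hypothetical sketch: pick training indices so that, within each chain,
# round(frac * ndraws_in_chain) draws are used for training.
function stratified_train_indices(rng::AbstractRNG, chains::AbstractVector{Int}, frac::Real)
    0 < frac < 1 || throw(ArgumentError("frac must be strictly between 0 and 1"))
    train = Int[]
    for c in unique(chains)
        inds = findall(==(c), chains)
        ntrain = round(Int, frac * length(inds))
        # random subset of this chain's draws
        append!(train, shuffle(rng, inds)[1:ntrain])
    end
    return sort!(train)
end

chains = vcat(fill(1, 10), fill(2, 20))
train = stratified_train_indices(MersenneTwister(42), chains, 0.7)
test_inds = setdiff(eachindex(chains), train)
```

With frac = 0.7, 7 of chain 1's 10 draws and 14 of chain 2's 20 draws land in the training set regardless of the seed, so neither chain can be under-represented by chance.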